23 research outputs found

    On the information theory of clustering, registration, and blockchains

    Get PDF
    Progress in data science depends on the collection and storage of large volumes of reliable data, efficient and consistent inference based on this data, and trusting such computations made by untrusted peers. Information theory provides the means to analyze statistical inference algorithms, inspires the design of statistically consistent learning algorithms, and informs the design of large-scale systems for information storage and sharing. In this thesis, we focus on the problems of reliability, universality, integrity, trust, and provenance in data storage, distributed computing, and information processing algorithms and develop technical solutions and mathematical insights using information-theoretic tools. In unsupervised information processing we consider the problems of data clustering and image registration. In particular, we evaluate the performance of the max mutual information method for image registration by studying its error exponent and prove its universal asymptotic optimality. We further extend this to design the max multiinformation method for universal multi-image registration and prove its universal asymptotic optimality. We then evaluate the non-asymptotic performance of image registration to understand the effects of the properties of the image transformations and the channel noise on the algorithms. In data clustering we study the problem of independence clustering of sources using multivariate information functionals. In particular, we define consistent image clustering algorithms using the cluster information, and define a new multivariate information functional called illum information that inspires other independence clustering methods. We also consider the problem of clustering objects based on labels provided by temporary and long-term workers in a crowdsourcing platform. Here we define budget-optimal universal clustering algorithms using distributional identicality and temporal dependence in the responses of workers. For the problem of reliable data storage, we consider the use of blockchain systems, and design secure distributed storage codes to reduce the cost of cold storage of blockchain ledgers. Additionally, we use dynamic zone allocation strategies to enhance the integrity and confidentiality of these systems, and frame optimization problems for designing codes applicable for cloud storage and data insurance. Finally, for the problem of establishing trust in computations over untrusting peer-to-peer networks, we develop a large-scale blockchain system by defining the validation protocols and compression scheme to facilitate an efficient audit of computations that can be shared in a trusted manner across peers over the immutable blockchain ledger. We evaluate the system over some simple synthetic computational experiments and highlights its capacity in identifying anomalous computations and enhancing computational integrity

    Beliefs and expertise in sequential decision making

    Full text link
    This work explores a sequential decision making problem with agents having diverse expertise and mismatched beliefs. We consider an N-agent sequential binary hypothesis test in which each agent sequentially makes a decision based not only on a private observation, but also on previous agents’ decisions. In addition, the agents have their own beliefs instead of the true prior, and have varying expertise in terms of the noise variance in the private signal. We focus on the risk of the last-acting agent, where precedent agents are selfish. Thus, we call this advisor(s)-advisee sequential decision making. We first derive the optimal decision rule by recursive belief update and conclude, counterintuitively, that beliefs deviating from the true prior could be optimal in this setting. The impact of diverse noise levels (which means diverse expertise levels) in the two-agent case is also considered and the analytical properties of the optimal belief curves are given. These curves, for certain cases, resemble probability weighting functions from cumulative prospect theory, and so we also discuss the choice of Prelec weighting functions as an approximation for the optimal beliefs, and the possible psychophysical optimality of human beliefs. Next, we consider an advisor selection problem where in the advisee of a certain belief chooses an advisor from a set of candidates with varying beliefs. We characterize the decision region for choosing such an advisor and argue that an advisee with beliefs varying from the true prior often ends up selecting a suboptimal advisor, indicating the need for a social planner. We close with a discussion on the implications of the study toward designing artificial intelligence systems for augmenting human intelligence.https://arxiv.org/abs/1812.04419First author draf

    Beliefs in Decision-Making Cascades

    Full text link
    This work explores a social learning problem with agents having nonidentical noise variances and mismatched beliefs. We consider an NN-agent binary hypothesis test in which each agent sequentially makes a decision based not only on a private observation, but also on preceding agents' decisions. In addition, the agents have their own beliefs instead of the true prior, and have nonidentical noise variances in the private signal. We focus on the Bayes risk of the last agent, where preceding agents are selfish. We first derive the optimal decision rule by recursive belief update and conclude, counterintuitively, that beliefs deviating from the true prior could be optimal in this setting. The effect of nonidentical noise levels in the two-agent case is also considered and analytical properties of the optimal belief curves are given. Next, we consider a predecessor selection problem wherein the subsequent agent of a certain belief chooses a predecessor from a set of candidates with varying beliefs. We characterize the decision region for choosing such a predecessor and argue that a subsequent agent with beliefs varying from the true prior often ends up selecting a suboptimal predecessor, indicating the need for a social planner. Lastly, we discuss an augmented intelligence design problem that uses a model of human behavior from cumulative prospect theory and investigate its near-optimality and suboptimality.Comment: final version, to appear in IEEE Transactions on Signal Processin

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Get PDF
    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.Peer reviewe

    Nations within a nation: variations in epidemiological transition across the states of India, 1990–2016 in the Global Burden of Disease Study

    Get PDF
    18% of the world's population lives in India, and many states of India have populations similar to those of large countries. Action to effectively improve population health in India requires availability of reliable and comprehensive state-level estimates of disease burden and risk factors over time. Such comprehensive estimates have not been available so far for all major diseases and risk factors. Thus, we aimed to estimate the disease burden and risk factors in every state of India as part of the Global Burden of Disease (GBD) Study 2016

    Decision Making in Star Networks with Incorrect Beliefs

    Full text link
    Consider a Bayesian binary decision-making problem in star networks, where local agents make selfish decisions independently, and a fusion agent makes a final decision based on aggregated decisions and its own private signal. In particular, we assume all agents have private beliefs for the true prior probability, based on which they perform Bayesian decision making. We focus on the Bayes risk of the fusion agent and counterintuitively find that incorrect beliefs could achieve a smaller risk than that when agents know the true prior. It is of independent interest for sociotechnical system design that the optimal beliefs of local agents resemble human probability reweighting models from cumulative prospect theory. We also consider asymptotic characterization of the optimal beliefs and fusion agent's risk in the number of local agents. We find that the optimal risk of the fusion agent converges to zero exponentially fast as the number of local agents grows. Furthermore, having an identical constant belief is asymptotically optimal in the sense of the risk exponent. For additive Gaussian noise, the optimal belief turns out to be a simple function of only error costs and the risk exponent can be explicitly characterized.Comment: final version, to appear in IEEE Transactions on Signal Processin

    Downlink Resource Allocation Under Time-Varying Interference: Fairness and Throughput Optimality

    No full text
    corecore